Chapter 15
Introducing Correlation and Regression
IN THIS CHAPTER
Getting a handle on correlation analysis
Understanding the many kinds of regression analysis
Correlation, regression, curve-fitting, model-building — these terms all describe a set of general
statistical techniques that deal with the relationships among variables. Introductory statistics courses
usually present only the simplest form of correlation and regression, equivalent to fitting a straight line
to a set of data. But in the real world, correlations and regressions are seldom that simple — statistical
problems may involve more than two variables, and the relationships among them can be quite
complicated.
The words correlation and regression are often used interchangeably, but they refer to two
different concepts, which the short sketch after this list illustrates:
Correlation refers to the strength and direction of the relationship between two variables, or
among a group of variables.
Regression refers to a set of techniques for describing how the values of a variable or a group of
variables may cause, predict, or be associated with the values of another variable.
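To make the distinction concrete, here is a minimal sketch in Python (our choice of language, not the book's; the data values and variable names are invented for illustration). It first computes a Pearson correlation coefficient for two variables and then fits a straight-line regression to the same data, using SciPy's pearsonr and linregress functions:

import numpy as np
from scipy import stats

# Hypothetical toy data: the values and names are made up for
# illustration and do not come from the chapter.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 2.9, 3.6, 4.8, 5.1, 6.3])

# Correlation: a single number (r) summarizing the strength and
# direction of the linear association between x and y.
r, p_value = stats.pearsonr(x, y)
print(f"Pearson r = {r:.3f} (p = {p_value:.4f})")

# Regression: an equation (here a straight line, y = a + b*x) that
# describes how y changes with x and can be used for prediction.
fit = stats.linregress(x, y)
print(f"Fitted line: y = {fit.intercept:.3f} + {fit.slope:.3f} * x")
print(f"Predicted y at x = 7: {fit.intercept + fit.slope * 7:.3f}")

The correlation step produces only a single summary number, r, while the regression step produces an equation you can use to predict y from a new value of x, which is exactly the difference the two definitions above describe.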
You can study correlation and regression for many years and not master all of it. In this chapter, we
cover the kinds of correlation and regression most often encountered in biological research and
explain the differences between them. We also explain some terminology used throughout Parts 5 and
6.
Correlation: Estimating How Strongly Two Variables Are Associated
Correlation refers to the extent to which two variables are related. In the following sections, we
describe the Pearson correlation coefficient and discuss ways to analyze correlation coefficients.
Lining up the Pearson correlation coefficient
The Pearson correlation coefficient is represented by the symbol r and measures the extent to
which two variables (X and Y) tend to lie along a straight line when graphed. If the variables have